bkpr first use after large migration issues: the key fixes#8618
Merged
sangbida merged 3 commits intoOct 20, 2025
Conversation
…own on large numbers of requests. Note that we create a destructor on the command to reset request->cmd pointer if the cmd is freed (so we know not to call the callback). But attaching hundreds of thousands of them is slow: it's a single-linked list, which is iterated in several places. But that's redundant: the request is now allocated off the cmd, so freeing the command will free the request anyway. Hacking in something to print progress to a file, here's the number of requests processed every 10 seconds before and after: Before: $ while sleep 10; do wc -l /tmp/bkpr-progress; done 181529 /tmp/bkpr-progress 195994 /tmp/bkpr-progress 207083 /tmp/bkpr-progress 226336 /tmp/bkpr-progress 234319 /tmp/bkpr-progress 241514 /tmp/bkpr-progress 247421 /tmp/bkpr-progress 255292 /tmp/bkpr-progress 261367 /tmp/bkpr-progress 269085 /tmp/bkpr-progress 276953 /tmp/bkpr-progress 282233 /tmp/bkpr-progress 286193 /tmp/bkpr-progress 290930 /tmp/bkpr-progress 295276 /tmp/bkpr-progress 301086 /tmp/bkpr-progress After: 169505 /tmp/bkpr-progress 196010 /tmp/bkpr-progress 219370 /tmp/bkpr-progress 235671 /tmp/bkpr-progress 244242 /tmp/bkpr-progress 255362 /tmp/bkpr-progress 265636 /tmp/bkpr-progress 276966 /tmp/bkpr-progress 284451 /tmp/bkpr-progress 288836 /tmp/bkpr-progress 296578 /tmp/bkpr-progress 304571 /tmp/bkpr-progress Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
c4b321a to
f04241b
Compare
This significantly speeds up the query which bookkeeper often does: "SELECT created_index" " FROM channelmoves" " WHERE payment_hash = X'%s'" " AND credit_msat = %"PRIu64 " AND created_index <= %"PRIu64, On large databases this scan is expensive, and a payment_hash index cuts it down a great deal. It does take longer to load the channelmoves in the first place though (about 3x). Before: $ while sleep 10; do wc -l /tmp/bkpr-progress; done 169505 /tmp/bkpr-progress 196010 /tmp/bkpr-progress 219370 /tmp/bkpr-progress 235671 /tmp/bkpr-progress 244242 /tmp/bkpr-progress 255362 /tmp/bkpr-progress 265636 /tmp/bkpr-progress 276966 /tmp/bkpr-progress 284451 /tmp/bkpr-progress 288836 /tmp/bkpr-progress 296578 /tmp/bkpr-progress 304571 /tmp/bkpr-progress After: $ while sleep 10; do wc -l /tmp/bkpr-progress; done 161421 /tmp/bkpr-progress 238273 /tmp/bkpr-progress 281185 /tmp/bkpr-progress 305787 /tmp/bkpr-progress Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: plugins: the sql plugin now keeps an index on `channelmoves` by `payment_hash`.
If we read all of them, we might get 1.6M at once (after initial migration). Then we submit a few hundred thousand simultaneous requests to lightningd, and it gets upset, queueing them all on the xpay command hook and running out of memory. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Fixed: plugins: bookkeeper first invocation after migration from prior to 25.09 with very large databases will not crash.
f04241b to
bd8c907
Compare
sangbida
approved these changes
Oct 19, 2025
Collaborator
sangbida
left a comment
There was a problem hiding this comment.
This looks good to me :) I did have a couple of questions:
- Do folks who have already migrated have to do anything extra (another migration?) to get the new index, does a migration just happen when a database change is made?
- Are the changes to the trace array not required after all?
- Is it worth writing a test for the read listchannelmoves 1000 entries at a time? Just wondering if there's a chance we could hit funny edge cases here in a case of a database rollback/deletion? I remember you mentioned that the cln database is not just appened only, not sure if that's entirely relevant here or not though.
Contributor
Author
In this case, there's no db migration as such. It's just bookkeeper processing those records now in the db (exposed via listchannelmoves).
Not for them: I checked, and HAVE_USDT=0 in their config.
In this case, it's actually good: I tested by hand using a real 1.6M entry db. But also, I have a series of real performance fixes which remove this requirement for the 1000 limit in the next major release. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes: #8614
When you have 1.6 million channelmoves (migrated from accounts.db), the first call to any bookkeeper function will cause it to digest them all. This proved very slow, and eventually crashed the node.
These three fix this.